Introduction

Airbnb has not only changed how people travel and live, but has also created new business potential. We are interested in exploring the data generated by Airbnb and analyzing it to find interesting facts about Airbnb listings in New York. We hope to generate useful insights that can guide customers and offer business suggestions to hosts.

Get and Prepare Data

We get our data from Inside Airbnb, a data source that provides datasets of information about Airbnb listings. We use the most recent (September 2019) dataset for New York. The data is not cleaned, so we need to spend some time tidying it.

The original dataset has 106 variables, and we select only the variables relevant to our analysis. The original dataset also has 48377 listings. We define active listings as those that received reviews within the past 12 months and focus only on these. We also include only listings with a price greater than zero, since we found that a price of zero is likely due to an error in data collection.
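The two filters described above can be sketched in a few lines. This is a minimal illustration rather than the code used for the analysis: the column names `last_review` and `price`, the "$1,200.00" price format, and the snapshot date are all assumptions, and the rows are made-up toy data.

```python
from datetime import date, timedelta


def parse_price(text):
    """Turn a price string such as "$1,200.00" into a float (assumed format)."""
    return float(text.replace("$", "").replace(",", ""))


def is_active(listing, snapshot, window_days=365):
    """True if the listing was reviewed within the window and has a positive price."""
    last_review = listing.get("last_review")
    if not last_review:
        return False  # never reviewed, so not active by our definition
    age = snapshot - date.fromisoformat(last_review)
    return age <= timedelta(days=window_days) and parse_price(listing["price"]) > 0


# Toy rows, not real listings.
listings = [
    {"id": 1, "last_review": "2019-08-20", "price": "$120.00"},
    {"id": 2, "last_review": "2017-01-05", "price": "$90.00"},   # review too old
    {"id": 3, "last_review": "2019-06-11", "price": "$0.00"},    # zero price
    {"id": 4, "last_review": "", "price": "$75.00"},             # no review at all
]
snapshot = date(2019, 9, 30)  # hypothetical snapshot date
active = [row for row in listings if is_active(row, snapshot)]
print([row["id"] for row in active])  # [1]
```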

For the cleaning fee (cleaning_fee) and security deposit (security_deposit), it is intuitive to replace missing values with 0. A small number of other variables have missing values; we simply exclude the affected listings, since the gaps may also be due to collection errors in the original data.
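As a rough sketch of this rule, the snippet below fills missing fee values with 0 and drops any row that is missing another required variable. The `bedrooms` column and the toy rows are hypothetical, and treating both `None` and an empty string as missing is an assumption about how gaps appear in the raw file.

```python
FEE_COLUMNS = ("cleaning_fee", "security_deposit")
MISSING = (None, "")


def clean_missing(rows, required):
    """Fill missing fees with 0.0 and drop rows missing any required column."""
    cleaned = []
    for row in rows:
        row = dict(row)  # copy, so the input rows are left untouched
        for col in FEE_COLUMNS:
            if row.get(col) in MISSING:
                row[col] = 0.0
        if all(row.get(col) not in MISSING for col in required):
            cleaned.append(row)
    return cleaned


# Toy rows, not real listings.
rows = [
    {"id": 1, "cleaning_fee": "", "security_deposit": 200.0, "bedrooms": 1},
    {"id": 2, "cleaning_fee": 50.0, "security_deposit": None, "bedrooms": 2},
    {"id": 3, "cleaning_fee": 30.0, "security_deposit": 0.0, "bedrooms": None},
]
cleaned = clean_missing(rows, required=("bedrooms",))
print([row["id"] for row in cleaned])  # [1, 2]
```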

Finally, we have 28098 listings and 62 variables for our analysis. We are interested in a few areas, including Price, Reviews, Room Type, and Host Response Rate. Our analysis answers some interesting questions about Airbnb listings in New York.

Analysis

The majority of listings in our data are in Manhattan and Brooklyn. The closer to the New York City downtown area, the denser the listings.

Price

Booking price is an important factor that both customers and hosts care about. In this section, we explore Airbnb booking prices in the New York market.

How Is Booking Price Distributed?

The first plot shows the overall distribution of price. It is skewed right with a very long tail, so we apply a log 10 transformation to get a better visualization. The distribution shows that the median price is $100.
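To illustrate why the log 10 transformation helps, the sketch below uses a small made-up price list (not our data) in which one very expensive listing pulls the mean far above the median. On the log scale, the same listing sits only about one unit above the median.

```python
import math
import statistics

# Made-up prices with one extreme value to mimic a long right tail.
prices = [45, 60, 80, 100, 100, 150, 250, 999]

mean_price = statistics.mean(prices)
median_price = statistics.median(prices)
print(mean_price, median_price)  # the mean is pulled up by the tail

# log10 compresses the tail: 100 maps to 2.0 and 999 to just under 3.0.
log_prices = [math.log10(p) for p in prices]
print(statistics.median(log_prices), max(log_prices))
```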

The following boxplot also shows that most of the listings have a price under 200 dollars. Some listings have extremely high prices. What are these listings? Why are they so expensive? Let’s further explore these high-priced listings.

What Makes Some Listings Most Expensive?

We focus on the listings with a price above $600. Most of these listings are in Manhattan, which is reasonable. Furthermore, most of the high-priced listings are in Midtown, which is also not unexpected, because Midtown is the central portion of Manhattan.

Moreover, we find that most of these high-priced listings are of room type entire home/apt and have about 2-4 bedrooms. This may explain their high prices: it is reasonable for a big apartment in Midtown Manhattan to be expensive.

Does Location Make a Difference?

Based on the price distribution above, we are interested in how prices are affected by location. We plotted the push pin map below, which shows the price distribution across New York districts.

Most of the listings are between $100 and $250 per night, and the closer we are to downtown New York, the higher the price. The purple points indicate expensive listings above $250 per night. Listings that cost more than $500 per night are rare.

Indeed, price is largely affected by location. To better illustrate this, we plotted the average price per night for the five neighborhoods.

We can see that the Bronx, Staten Island, and Queens are almost the same. Brooklyn and Manhattan are more expensive by about $50 to $100 per night, which is a large amount considering that the majority of prices are around $100 per night.
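The per-neighborhood average behind this comparison is a simple group-and-average step. The sketch below shows one way to compute it; the key name `neighbourhood_group` is an assumed column name, and the prices are made-up values for illustration, not results from our data.

```python
from collections import defaultdict


def mean_price_by_group(rows, group_key="neighbourhood_group", price_key="price"):
    """Return {group: average price} for a list of listing dicts."""
    totals = defaultdict(lambda: [0.0, 0])  # group -> [sum of prices, count]
    for row in rows:
        bucket = totals[row[group_key]]
        bucket[0] += row[price_key]
        bucket[1] += 1
    return {group: total / count for group, (total, count) in totals.items()}


# Toy rows with invented prices.
toy_rows = [
    {"neighbourhood_group": "Manhattan", "price": 180.0},
    {"neighbourhood_group": "Manhattan", "price": 220.0},
    {"neighbourhood_group": "Brooklyn", "price": 120.0},
    {"neighbourhood_group": "Brooklyn", "price": 140.0},
    {"neighbourhood_group": "Queens", "price": 90.0},
]
averages = mean_price_by_group(toy_rows)
print(averages)
```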

The price distributions for the different neighborhoods share the same shape. They have roughly the same variance but different means, which corresponds to our previous observation.

Are Deposit and Cleaning Fee Proportional to Price?

When we browse through different listings, the price that the Airbnb site displays does not include the required deposit or the cleaning fee. We are interested in whether the deposit and cleaning fee are proportional to the listing price because, intuitively, we think they are.

Only a few listings have a cleaning fee of 0, and almost all cleaning fees are within $200, which is a pretty high range considering that our median listing price per night is $100. Only 3% of the listings do not require a security deposit, and surprisingly about 25% (about 7000 listings) have a security deposit of $500.

We use scatter plots to explore the relationship between price and cleaning fee and security deposit.

We did not observe any obvious pattern in price vs cleaning fee or price vs security deposit. In the scatterplot of price vs security deposit, there are some horizontal lines because security deposits tend to be set to integers, so the “lines” appear because the data are discrete.
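A numeric complement to reading the scatter plots is the Pearson correlation coefficient, where values near 0 indicate no linear relationship. The sketch below is an illustration only: our conclusion above comes from the plots, and the price and fee values here are made up.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up values: nightly price and cleaning fee for five listings.
toy_prices = [80, 100, 120, 200, 95]
toy_cleaning_fees = [50, 20, 80, 40, 60]
r = pearson(toy_prices, toy_cleaning_fees)
print(round(r, 2))
```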

Does Room Type Affect Price?

Room type is also an important factor that customers care about. There are four room types in the Airbnb market: entire home/apt, private room, shared room, and hotel room.

First, we take a look at the price distribution for each room type. Entire home/apt and hotel room have relatively higher prices than private room and shared room, which is reasonable. Additionally, shared room has the lowest price in the market. These findings agree with common sense. We continue by exploring the price distribution for each room type in four neighbourhoods.

From the graph, we find that in Manhattan, hotel room has the highest average price. We also see that entire home/apt is always priced much higher than private room and shared room, and that its price is especially high in Manhattan. Therefore, if customers have limited budgets, it is not wise to look for hotels and entire home/apt listings in Manhattan. For shared room, Brooklyn and the Bronx have the lowest average price. For entire home/apt and private room, Staten Island has the lowest average price.

Surprisingly, in such an expensive living area as Manhattan, lots of listings are “Entire Home/Apt”. It’s true that Manhattan has the most expensive listings, but they are also of relatively good quality (instead of being all shared rooms and small private rooms).

Does Property Type Affect Price?

We use a stacked bar chart to show the proportion of each property type among listings across the five neighborhoods.

Most of the listings are Apartments, followed by Houses. Manhattan has the most Apartments and Staten Island has the most Houses. Interestingly, Manhattan has barely any Houses, which makes sense because Manhattan has a very high population density and needs many more Apartments to hold more people. We can also observe that Manhattan, Brooklyn, and Queens listings have a bigger variety of property types, including Loft and even Townhouse.

As observed before, the majority are Apartments and Hotels. The closer to the center of the New York downtown area, the more apartments and the fewer houses we see.

Is There Any Correlation Between Price and Other Variables?

Other variables may also be important factors that affect price, such as number of reviews, minimum number of nights, host response rate, and review scores rating. However, from the pairwise scatter plot, we can see that price does not have a strong correlation with these variables. The price is set by hosts, and not every host has access to detailed financial and marketing information. Additionally, not every host’s purpose is to make money, so price may not reflect actual market value. Price setting is subjective and varies by individual. Therefore, a high price does not guarantee good service.

Reviews

Analyzing reviews can reveal how customers value the quality of Airbnb services. Customers can rate an Airbnb stay in six categories: cleanliness, accuracy, check-in, communication, location, and value, which together we think represent the overall experience of the stay.

How to Better Understand Customer Reviews?

It is useful to explore which category affects the “value” rating the most, because we take a high “value” rating to mean that customers had the best experience during their stay. Knowing what customers care about can help hosts understand what to focus on improving in order to provide better service and attract more customers.

We plotted this bar chart to represent the percentage of full scores in each of the scoring categories. About 80% of the listings have a full score in cleaning and check-in, meaning most customers are most satisfied with these two categories. It’s worth noticing that customers are least likely to give a full score in the “Cleaning” category, meaning this might be one factor that hosts in New York need to pay more attention to.

Note that this is the percentage of full “value” scores given a full score in each of these scoring categories. Therefore, a higher percentage means customers who are satisfied with this category are also satisfied with the “value” category, which implies they had a better overall experience during their stay.

In addition, accuracy tends to affect how customers evaluate the “value” of their trip. Combining these two, we can say that “Cleaning” and “Accuracy” tend to have a higher impact on customers’ overall experience of their trip.

Therefore, we suggest that hosts pay attention to the cleanliness of their rooms and avoid giving exaggerated descriptions of their listings.
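The conditional percentage used in this section (the share of full “value” scores among listings with a full score in a given category) can be sketched as follows. The short key names, the assumption that a full score is 10, and the toy rows are all illustrative rather than taken from our data.

```python
def full_value_share(rows, category, full=10):
    """Share of rows with a full "value" score among rows with a full `category` score.

    Returns None when no row has a full score in `category`.
    """
    with_full_category = [row for row in rows if row[category] == full]
    if not with_full_category:
        return None
    hits = sum(1 for row in with_full_category if row["value"] == full)
    return hits / len(with_full_category)


# Toy review scores, not real listings.
toy_scores = [
    {"cleanliness": 10, "accuracy": 10, "value": 10},
    {"cleanliness": 10, "accuracy": 9, "value": 10},
    {"cleanliness": 9, "accuracy": 10, "value": 9},
    {"cleanliness": 10, "accuracy": 10, "value": 9},
]
for category in ("cleanliness", "accuracy"):
    print(category, full_value_share(toy_scores, category))
```

In this toy example, two of the three listings with a full cleanliness score also have a full value score, while only one of the three with a full accuracy score does.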

Interactive Map

We implemented a Shiny app that provides an interactive map of New York Airbnb listings. The price shown on an Airbnb listing is just the base price per night, and the number of guests included varies from listing to listing. To make prices comparable, we calculated the price per person per night. The Shiny app facilitates exploration of the listings from a potential customer’s point of view, with filters for the number of people staying, the price range, the room type, and whether the host is a superhost. The box plot provides a summary of the filtered listings by neighbourhood. Clicking on the map opens a popup with some information about the selected listing and a link for further investigation. Users should be aware that some listings charge an additional cleaning fee per stay. On top of all of the above, there is a 13% service charge from Airbnb!
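The price per person per night can be sketched as the base price divided by the number of guests included. This is an assumed formula shown in Python for illustration; it is not the Shiny app's code, and treating a guest count below 1 as a single guest is also an assumption.

```python
def price_per_person(price, guests_included):
    """Base nightly price divided by the number of guests it covers."""
    guests = max(int(guests_included), 1)  # assumption: count values below 1 as one guest
    return price / guests


# Two made-up listings with the same base price but different guest counts.
print(price_per_person(150, 1))  # 150.0
print(price_per_person(150, 3))  # 50.0
```

The same base price of $150 is three times cheaper per person when it covers three guests, which is why the map compares listings on a per-person basis.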

Conclusion

In this project, we analyzed Airbnb listings in New York as of September 2019. Our data shows that most of the listings are in Manhattan and Brooklyn. We explored some interesting facts about listing prices. Most of the listings have a price under $250, with a median of $100. However, some listings have extremely high prices (even $1000 per night). These listings are entire homes/apts with more than one bedroom in Manhattan.

We confirmed that location is an important factor affecting price. The closer to downtown New York, the higher the price. The average price in Manhattan is almost twice that of the other neighbourhoods.

For listings with a cleaning fee and security deposit, these fees are relatively high compared to the listing price per night. We did not see any obvious correlation between price and cleaning fee or between price and security deposit.

Price is also affected by room type and property type. “Entire Home/Apt” and “Hotel Room” are more expensive than “Private Room” and “Shared Room”, especially in Manhattan. We also notice that lots of the listings in Manhattan are “Entire Home/Apt” rather than “Shared Room”. A majority of the listings in Manhattan are “Apartment”. We think there is a link between Manhattan having lots of “Apartment” properties and lots of “Entire Home/Apt” room types: these are small apartments, and it is easier to rent out the whole unit.

Regarding customer reviews, we found that the perceived value of the trip tends to be highly affected by the “Cleanliness” of the room and the “Accuracy” of the listing.

Finally, in the interactive map, customers should be aware of the difference between the listing price on Airbnb and the actual price per person, which is a major feature of our map.